Dataset Descriptions - _ChoiceCZAL523

A) Reservation data :

i) Account : Account number for transactional purpose unique code (for cashiering and accounting personnel to reference).

ii) Guest Name : Name of a guest staying at hotel.

iii) Arrive : Guest arrival (entry) date.

iv) Depart : Guest exit date.

v) Nights : Total number of nights (days) a guest is staying.

vi) Status : Room current status

O (Occupied), R (Reserved), I (Inspected), N (No-Show), C (Cancelled) 1) No-Show: A reservation becomes a No-Show when the customer who has a guaranteed reservation does not cancel it before the hotel’s cancellation deadline, and never arrives to claim the reservation.

2) Occupied: Customer booked the room and actually he arrived as per check-in date.

3) Reserved: If the guest has not claimed or canceled the booked room by the specified time with hotel policies, you must hold the reserved rooms available until check-out time the following day.

4) Canclled: Customer has been cancelled the booked room successfully within the time period of cancellation policies.

5) Inspected: The main Purpose of a room inspection is to catch any problems that may have been overlooked during the cleaning before it is found by the guest and becomes a dissatisfaction and complaint. After guest check-out, manager is responsible for the room inspection before it allocated to new customer.

vii) Rate : The price of hotel charges for overnight accommodations (Room rates).

viii) Rate code : Room Rates and rate codes may also vary according to the available room features such as room size, location, view, furnishing, decors, competitors pricing. Similar to below references,

1) Early-bird Rate - This type of rates are only open X days before arrival. Eg: Open only when 7 Days before arrival.

2) Package Rate: Rates that includes a guest room in combination with other available events or activities. ( Eg: Best of London package which includes room rental, all meals, site seeing, airport transfers etc.)

3) Best Available Rates (BAR): These rate codes are the lowest discounted rate available for a day which can be offered to the guest by the Reservation or Front desk staff. BAR can be of different types

BAR Level         Open / Close when Occupancy Between              DATE          DAY      RATE
BAR - 01         0% TO 25 %                                       19/10/2016     WED     200.00
BAR - 02         26 % TO 35 %                                     20/10/2016     THU     200.00
BAR - 03         36% TO 50%                                       21/10/2016     FRI     250.00
BAR - 04         51% TO 75%                                       22/10/2016     SAT     250.00
BAR - 05         76% TO 100%                                      23/10/2016     SUN     250.00
                                                                  24/10/2016     MON     200.00
                                                                  25/10/2016     TUE     200.00

4) Family Rate: A rate reserved for families with children. Usually, these rates include Extra Bed charges and may also include some free add-on activities for children

ix) Room : Respetive room numbers assigned to rooms as per the hotel configurations.

x) Type :

1) NQQ - Double Queen Rooms (Non Smoking 2 Queen bed)

2) NK - Single King Rooms (Non Smoking Single King bed)

3) SNQQ - Double Queen Suites (Suites Non Smoking 2 Queen bed)

4) SNQQ1 - Double Queen Suites (Suites Non Smoking 2 Queen bed)

5) SNK - Single King Suites (Suites Non Smoking Single King bed)

6) NHQQ - Double Queen Room Handicap Equipped (Non Smoking Handicap 2 Queen bed)

7) NHK - Single King Room Handicap Equipped (Non Smoking Handicap Single King bed)

8) SNHK - Single King Suites Handicap Equipped (Suites Non Smoking Handicap Single King bed)

Note: i) 1 Queen bed --> Bed = 1 | Max adults = 2
     ii) 2 Queen bed --> Bed = 2 | Max adults = 4

ref: https://www.comfortinnofblueridge.com/accommodations/rooms/

xi) Source : The room booking mode which is used by guest

1) CRS Central Reservation System (online mode) :15714 2) DIRECT Direct (direct mode) :9330 3) WI Walk In (direct mode) :2423 xii) CRS Conf No : N.A

xiii) GTD : N.A

Guaranteed --> Prior to a function, the figure given by a meeting planner to the property for the number of persons to be served.

VI    7565
MC    4808
CC    4704
DB    3661
CA    2985
GM    2786
AX     673
DS     255
DP      13
CK       9
4P       6

xiv) Reserve Date : Room reservation date (On this day, guest reserve/ book the rooms by entering details as arrival, depart, adults, bed, etc)

xv) User : Receiption desk person id which is responsible for handling the guest activities

xvi) Shared Account : Is any shared / same account used for booking.

xvii) Track Code : Purpose for the room booking

Corporate (Meeting, signing, dealing) Leisure (Rest) Walk-in (Other purposes on quick decision) xviii) Package : Complete package cost for the guest including per person charge, inclusive of guest room, food and beverage, and proportional percentage of all other event charges.

(Packages seasonal - Winter, Spring, Summer) xix) Cancellation Date : The date on which cancelation request initiated for booked/ reserved room.

xx) CXL User ID : Person Id (receiption desk person/ individual) who is responsible to cancle the booking/ reservation room.

B) Occupancy data :

i) Date : Date of the occupancy

ii) Day : Day associated with date

iii) Rooms : Total number of the rooms in hotel

iv) OOO : OOO (Rooms that are in Out Of Order) is typically used when a room is being renovated, undergoing repairs and cannot be used.

v) StayOver : The guest is not expected to check out today and will remain at least one more night or may be week.

vi) Arrivals : Displays all guests arriving on a certain day or within a certain set time period.

vii) DueOut : The room is expected to become vacant after the following day’s checkout time.

viii) Available : Total numbers of available rooms (-ve sign indicates overdue)

ix) Group Block : Booking the rooms in a group or package (Ex. For event, people book multiple rooms)

x) Group Picked Up : Booking the rooms in a group.

xi) TransNGTD : NA.

*GTD: Guaranteed. Prior to a function, the figure given by a meeting planner to the property for the number of persons to be served.

xii) TransGTD : NA.

*GTD: Guaranteed. Prior to a function, the figure given by a meeting planner to the property for the number of persons to be served.

xiii) Occupied : Total number of rooms are in used/ occupied (Guest is currently registered to the room). [Rooms - Available - GroupBlock + GroupPickedUp]

xiv) OccPercentage : Room coocupany rate on perticular day (occupied rooms / total available rooms). The occupancy rate will tell you exactly how full your hotel is at any point in time.

xv) RoomRev : Total revenue generated from occupied rooms on perticulaar day

xvi) RevPAR : Revenue Per Available Rooms (total revenue / total available rooms). This metric provides a glimpse into the number of rooms that are being sold at a hotel and how much revenue is being generated from those bookings. You should use RevPAR to understand the best way to maximise the revenue generated per room. If the RevPAR of your property is increasing, it must mean your average room rate or occupancy rate is increasing – or both!

xvii) ADR : Average daily rate of guest on perticular day (total revenue / rooms occupied). it is calculated using the amount of revenue earned and the number of rooms sold to give you an average rate. [Ref: https://www.siteminder.com/r/calculate-revpar/]

1) ADR will simply tell you how much revenue each sold room is selling for on average, while RevPAR will tell you how much revenue you’re bringing in for all your rooms.

2) Since ADR simply indicates the price of your rooms while RevPAR will tell you how much money you yield from each room, sold or not.

viii) Ppl : NA.

**Formulaes Details - Traditionally, the effectiveness of revenue management strategy is measured using the following KPIs

a) Occupancy rate — the number of occupied rental units at a given time, compared to the total number of available rental units at that time.

Occupancy rate = Rooms sold / Room available

b) Average daily rate (ADR) — this rate is applied to a room’s average rental income during a certain period. It’s compared to hotel’s historical ADR or competitors’ metrics.

ADR = Rooms revenue earned / Number of rooms sold

c) Revenue per available room (RevPAR) — a KPI that assess financial and business performance of a hotel. RevPAR measures ability of a property to fill all the rooms and define the best price for them. There are two ways to calculate it:

RevPAR = Rooms revenue / Rooms available
RevPAR = Average daily rate * Occupancy rate

d) Total revenue per available room (TRevPAR) — a metrics that accesses total revenue, generated by property and based on room cost and money spent on it. This KPI captures a snapshot of overall business performance. TRevPAR is one of the main benchmarking tools for big hotels and resorts.

The higher the TRevPAR, — the better the revenue.
TRevPAR = Total revenue / Total number of available rooms

e) Net revenue per available room (NRevPAR) — a KPI that allows hotel revenue managers to calculate the distribution cost to see how the room revenue is generated. NRevPAR includes spending on marketing and distribution.

NRevPAR = (Room revenue — distribution costs) / Number of available rooms

f) Gross operating profit per available room (GOPPAR) — measures the profit of a hotel and value of all assets at any given time. GOPPAR measures profit to capacity, including all a hotel’s spending and taxes.

GOPPAR = Gross Operating Profit / Number of available rooms

Ref: https://www.siteminder.com/r/calculate-revpar/ | https://www.upstay.tech/hotel-revenue-management-formulas-kpis-calculations-use-cases/

A) Library And Data Loading

A1 - Library importing

Importing the required libraries and supportive packages

A2 - Data loading from the source

Importing the data '.txt' from the source and save in '.csv' format

B) Data Preparation

As we have the raw data and need to analyze, prepare for business case studies. This stage involves data preprocessing, data cleaning and data understanding.

B1 - Reservation data

Data pre-processing steps:

i) Column rename

ii) Conversion of dataypes

iii) Replacenemt of unnecessary data values and EDA

iv) Missing value analysis

v) Date param extractions

vi) Data transformations and experiments

vii) Data insights

i) Reservation data size

Observation:

ii) Unique values in room types, rate codes, track codes

Observations:

In given property data,

iii) Analysis of room types

Observations:

iv) Analysis of rate codes

Observations:

iv) Analysis of track codes

Observations:

v) Analysis of nights (stay)

Observations:

B2 - Occupancy data

Data pre-processing steps:

i) Column rename

ii) Conversion of dataypes

iii) Replacenemt of unnecessary data values

iv) Missing value analysis

v) Date param extractions

vi) Data transformations and experiments

vii) Data insights and EDA

Observation:

B3 - Combination dataset

Observation:

We have 4 dataypes, and for machine learning, we need all inputs in numeric format strickly

Column Name     Datatype format
-------------------------------
Date            datetime
Year            int
Month           int
Days            int
Nights          int
Type            object
TrackCode       object
RateCode        object
Rate            float
Occupied        int
ADR             float

B4 - Yield Dataset

i) Load yield data

ii) Get yeild % values (based on check-in slot and occupancy level)

B5 - Added holiday - (Region = US)

Here, we are adding up the holidays in US region from python the package.

Package name           : holidays
Desciption             : US federal holiday calendar

ref: https://pypi.org/project/holidays/

Note: Initially we checking out (1) how much model learn/ capture this holiday effect? (2) And there is price change with this effect or not? (3) Holiday data is very very low as compared to actual data length, but with this initial data, we are trying to see is chance is happend or not?

$\color{Blue}{\text{,,}}$

i) US federal holiday calendar

Obsercation:

B6 - Date Features Extractions

Here, we are extracting Day of week, weekend, month name, day name, week name, etc... from Date column.

i) Extraction of data parameters

ii) Pre-processing data

iii) Validate data quality

iV) Analyzing the data and EDA

Note: Direct Date (datatype: datetimens) is not handle by ML model. Machine Learning algortihm expects data is in numeric form. Hence we are extracting requied fields from Date directly.

B7 - Event addition - (Region - US | Type - Sports)

Currently taking event data from web portal and save event data from date range 2017 to 2027 in excel file.

ref: https://www.timeanddate.com/holidays/us/super-bowl

Ex. (1) With reference link, in year 2022 below are the major sporting events.

13 Feb  Sunday    Super Bowl               Sporting event    
18 Apr  Monday    Boston Marathon          Sporting event    
6 May   Friday    Kentucky Oaks            Sporting event    
7 May   Saturday  Kentucky Derby           Sporting event    
21 May  Saturday  Preakness Stakes         Sporting event    
11 Jun  Saturday  Belmont Stakes           Sporting event    
6 Nov   Sunday    New York City Marathon   Sporting event


Note: Initially we checking out (1) how much model learn/ capture this effect? (2) And there is price change with this effect or not? (3) Event data is very very low as compared to actual data length, but with this initial data, we are trying to see is chance is happend or not?

C) Features Encoder

As Machien Learning algorithms expects all inputs must be in numeric format, hence in this step, we are converting non-numeric (Obejct) form to numeric (int) using AI techniques.

D) Modeling Part

In this section, we are applying Machine Learning algorithms to our data. The business case is related to prediction, hence applying ML Regression Algorithms and see the results on given data.

Input features : Year, Month, Days, Nights, Type, RateCode, TrackCode, Occupied

Output feature : Rate

Models : Linear model, Decision model, Boosting model

ML algorithms : LinearRegression, SVR, RandomForestRegression, GradientBoostingRegressor, XGBRegressor, KNeighborsRegressor

Matrics : MAE, RMSE, R2

Strategy - Considering 100% scale, about 70% data is use for model training purpose and remaining 30% data is use for testing purpose. So model is build on 70% of data and we will test the model (how model is being perform/ model behaviour) on 30% data which is unseen by model. This section is base model

ref: https://scikit-learn.org/stable/index.html

D3 - ML models and matric results

Models :

Cross Validation :

Accuracy measures :

Error measure :

Keypoints - For best model performance, below are the points to be consider for model selection as

ref: https://scikit-learn.org/stable/model_selection.html#model-selection

D4 - Model performanace comprision

We are comparing the model performaces, so that best model we can choose for tunning and predictions.

Table columns:

ref: (Cross validation) https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation

Observations:

Note: As top 2 models are XGBRegressor (83%) + RandomForestRgressor (83%). Previously we got 80% accuracy, here we improve accuracy about 3%. For finding best situable parameter, we need to perform hyperparameter tunning and using this we can enhance our model with best results.

E) Instance Balances Nature

Experiment on instance / data balancing and then approaching towards the modeling

ref: https://imbalanced-learn.org/stable/references/over_sampling.html

E1 - Method I - Data points balancing

Observation:

Observation:

E2 - Method I - Modeling part

E3 - Method II - Data points balancing

Observation:

Observation:

E3 - Method II - Modeling part

E4 - Method III - Data points balancing

Observation:

Observation:

E4 - Method III - Modeling part

E5 - Model performanace comprision (data balanced)

Observation:

We are comparing the model performaces, so that best model we can choose for tunning and predictions.

After data balacing mechanisms, we have seen the improvemtents in model performances as compared to the without effect of the balacing. But in this techniques there is chances of data points duplications and removals, as for balacing the data using AI techniques, different types of algorithms uses its internal procedure (Ex. targeting large values, distance based similarities, etc.).

E12 - Model performanace comprision (data balanced + duplicaions removal)

Observation:

We are comparing the model performaces, so that best model we can choose for tunning and predictions.

After data balacing mechanisms and duplicate datapoints/ records removal, we have seen the accuracy drop by 1 to 3 units and error increment by 1 to 3 units as compared to the pevious data balacing experiment.

Results without eliminating duplicates data points as,

        model    resample    trainingScore   testingScore   cvScore     maeRate     rmseRate
    9   RFR      SMOTEENN    0.989901        0.938760       0.946618    2.886929    6.538762
    1   RFR      SMOTE       0.979595        0.914869       0.918256    3.582198    7.786674
    5   RFR      ADASYN      0.979324        0.911031       0.913912    3.740118    7.994166
    10  KNN      SMOTEENN    0.944063        0.898050       0.912317    3.794717    8.436666
    2   KNN      SMOTE       0.919330        0.869398       0.878157    4.525902    9.644561
    6   KNN      ADASYN      0.916140        0.866234       0.872831    4.687299    9.802282
    8   XGB      SMOTEENN    0.897690        0.864710       0.869792    6.503276    9.718727
    0   XGB      SMOTE       0.877415        0.853478       0.857020    6.585655    10.215493
    4   XGB      ADASYN      0.869149        0.848388       0.848799    6.767652    10.435700
    3   GBR      SMOTE       0.751074        0.750621       0.747344    9.574245    13.327178
    7   GBR      ADASYN      0.745340        0.744134       0.744509    9.762927    13.556919
    11  GBR      SMOTEENN    0.743594        0.736216       0.739159    10.040380   13.570646

Observation:

Note: Hyperparam and model testing is pending...

F) Effect of holiday + event on prediction..

Let's analyze how is the model perform in combination of event and holiday in past data (testing dataset). So that we can know the effect of those combinations is work or not.

Note: Every time we need to pass new data index and new value manually which is present in testing data. (because we previously used shuffle method.)

F1 - Analysis using holiday / event effects on prices in model on model test dataset

i) Data seperation : holiday = 1 / event = 0

Holiday consideration (previous obs)

ii) Data seperation : holiday = 0 / event = 1

Event consideration (previous obs)

iii) Data seperation : holiday = 0 / event = 0

Business days consideration (previous obs)

iv) Data seperation : holiday = 1 / event = 1

Holiday and event consideration (previous obs)

Observation: On multiple time simulation in historical test data, below are the observation,

G) Hyperparam

Methods of tunning:

1) Grid search

2) Random Search

3) Bayesian optimizations

ref: https://www.anyscale.com/blog/what-is-hyperparameter-tuning

Note: Commenting this while section as it takes all available machine CPU cores (all physical cores), avaliable memory (above 98% RAM) and lots of time (multiple hours depending upon the parameter define, its all combinations and total passed data size)

H) Model Results

With param tuuning / without param tunning

I) User inputs

I1 - Data extraction based on user inputs

This section provides the simulation based on user inputs. User need to submit inputs and based on inputs, model is trying to predict the Room Rate.

I2 - Give user inputs here...

i) Current date param

I3 - Predictions for next days...

J) Representations

In this section, predictions visualization over date factor we can analyze.

i) Main chart - Prediction + Yield + Occupancy + RateCode + Day

Observation:

Observation:

ii) Monthly average prices - ML Predicted price + Yield based price

Observation:

iii) Day-wise average prices - ML Predicted price + Yield based price

Observation:

iii) Event and holiday stats wise avg - ML Predicted price + Yield based price

Observation:

Observation: